
    Affordable Person Detection in Omnidirectional Cameras Using Radial Integral Channel Features

    Omnidirectional cameras cover more ground than perspective cameras, at the expense of resolution. Their comprehensive field of view makes omnidirectional cameras appealing for security and ambient intelligence applications, in which person detection is usually a core component. Conventional methods fail on omnidirectional images due to their different image geometry and formation. In this study, we propose a method for person detection in omnidirectional images based on the integral channel features approach. Features are extracted from various channels, such as LUV and gradient magnitude, and classified using boosted decision trees. The features are pixel sums inside annular sectors (doughnut-slice shapes) contained by the detection window. We also propose a novel data structure, called the radial integral image, that allows sums inside annular sectors to be calculated efficiently. Experiments show that our method outperforms the previous state of the art while using significantly fewer computational resources.
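
    As a rough illustration of the radial integral image idea (not the paper's exact construction), one can resample the image into polar coordinates around the omnidirectional image center and take cumulative sums along each ray; the sum over an annular sector then reduces to a few lookups per covered angle. The function names and sampling resolution below are illustrative assumptions:

```python
import numpy as np

def radial_integral_image(img, center, n_angles=360, n_radii=200):
    """Sketch: polar resampling plus cumulative sums along each ray."""
    cy, cx = center
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    radii = np.arange(n_radii)
    # Sample the image on a polar (angle x radius) grid around the center.
    ys = np.clip(cy + radii[None, :] * np.sin(angles[:, None]), 0, img.shape[0] - 1).astype(int)
    xs = np.clip(cx + radii[None, :] * np.cos(angles[:, None]), 0, img.shape[1] - 1).astype(int)
    polar = img[ys, xs]                # shape: (n_angles, n_radii)
    return np.cumsum(polar, axis=1)    # cumulative sum along each ray

def annular_sector_sum(rii, a0, a1, r0, r1):
    # Sum of samples with angle index in [a0, a1) and radius index in
    # (r0, r1]: per ray, two cumulative sums differ; then sum over rays.
    return np.sum(rii[a0:a1, r1] - rii[a0:a1, r0])
```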

    Designing Computational Tools for Behavioral and Clinical Science

    Automatic analysis of human affective and social signals has brought computer science closer to the social sciences and, in particular, has enabled collaborations between computer scientists and behavioral scientists. In this talk, I highlight the main research areas in this burgeoning interdisciplinary field and provide an overview of its opportunities and challenges. Drawing on examples from our recent research, such as automatic analysis of interactive play therapy sessions with children and diagnosis of bipolar disorder from multimodal cues, as well as on examples from the growing literature, I explore the potential of human-AI collaboration, in which AI systems do not replace humans but support monitoring and decision making in the behavioral and clinical sciences.

    Selected Works From Automated Face and Gesture Recognition 2020

    The 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) was held online between 16 and 20 November 2020. The IEEE conference series on Automatic Face and Gesture Recognition is the premier international forum for research in image- and video-based face, gesture, and body movement recognition. Based on topical suitability, reviewer scores, and area chair comments, the program chairs of FG 2020 invited the authors of outstanding papers, from over 75 accepted papers, to submit extended versions of their work to a special issue of the IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR AND IDENTITY SCIENCE. These submissions went through the normal peer-review process at TBIOM—including, in some instances, substantial further revision and improvement—leading to the three papers appearing in this issue.

    Comparing Approaches for Explaining DNN-Based Facial Expression Classifications

    Classifying facial expressions is a vital part of developing systems capable of aptly interacting with users. In this field, the use of deep-learning models has become the standard. However, the inner workings of these models are unintelligible, which is an important issue when deploying them in high-stakes environments. Recent efforts to generate explanations for emotion classification systems have focused on this type of model. In this work, we present an alternative way of explaining the decisions of a more conventional model based on geometric features. We develop a geometric-features-based deep neural network (DNN) and a convolutional neural network (CNN). After ensuring a sufficient level of predictive accuracy, we analyze explainability using both objective quantitative criteria and a user study. The results indicate that, in terms of fidelity and accuracy scores, the explanations approximate the DNN well. The user study makes clear that the explanations increase understanding of the DNN and that they are preferred over the more commonly used explanations for the CNN. All scripts used in the study are publicly available.

    Modeling Short-Term and Long-Term Dependencies of the Speech Signal for Paralinguistic Emotion Classification

    Recently, Speech Emotion Recognition (SER) has become an important research topic in affective computing. It is a difficult problem, and some of its greatest challenges lie in feature selection and representation. A good feature representation should reflect the global trends as well as the temporal structure of the signal, since emotions naturally evolve in time; this has become possible with the advent of Recurrent Neural Networks (RNNs), which are actively used today for various sequence modeling tasks. This paper proposes a hybrid approach to feature representation that combines traditionally engineered statistical features with a Long Short-Term Memory (LSTM) sequence representation in order to exploit both the short-term and the long-term acoustic characteristics of the signal, thereby capturing not only the general trends but also the temporal structure of the signal. The proposed method is evaluated on three publicly available acted emotional speech corpora in three different languages, namely RUSLANA (Russian speech), BUEMODB (Turkish speech) and EMODB (German speech). Compared to the traditional approach, our experiments show an absolute improvement of 2.3% and 2.8% on two of the three databases, and comparable performance on the third. Therefore, provided enough training data, the proposed method proves effective in modeling the emotional content of speech utterances.
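
    As a minimal sketch of such a hybrid representation (layer sizes and names below are illustrative assumptions, not taken from the paper), an LSTM can summarize the frame-level acoustic sequence while the utterance-level engineered statistics are concatenated with its final hidden state before classification:

```python
import torch
import torch.nn as nn

class HybridSER(nn.Module):
    """Sketch: LSTM sequence summary fused with engineered statistics."""
    def __init__(self, n_frame_feats, n_stat_feats, n_classes, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_frame_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden + n_stat_feats, n_classes)

    def forward(self, frames, stats):
        # frames: (batch, time, n_frame_feats) frame-level acoustic features
        # stats:  (batch, n_stat_feats) utterance-level statistical features
        _, (h_n, _) = self.lstm(frames)
        fused = torch.cat([h_n[-1], stats], dim=1)  # long- + short-term cues
        return self.head(fused)
```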

    Predicting CO and NOx emissions from gas turbines: novel data and a benchmark PEMS

    Predictive emission monitoring systems (PEMS) are important tools for validating and backing up the costly continuous emission monitoring systems used in gas-turbine-based power plants. Their implementation relies on the availability of appropriate and ecologically valid data. In this paper, we introduce a novel PEMS dataset collected over five years from a gas turbine for the predictive modeling of CO and NOx emissions. We analyze the data using a recent machine learning paradigm and present useful insights about emission predictions. Furthermore, we present a benchmark experimental procedure for the comparability of future works on the data.
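
    As a hypothetical baseline sketch for this kind of PEMS regression task (the file path, column names, and model choice are assumptions for illustration, not the paper's benchmark procedure):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical CSV of sensor variables plus CO/NOx targets.
df = pd.read_csv("gas_turbine_emissions.csv")
X = df.drop(columns=["CO", "NOX"])   # ambient and turbine process variables
y = df[["CO", "NOX"]]                # the two emission targets

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out data:", model.score(X_te, y_te))
```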

    Refining Activation Downsampling With SoftPool

    Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps. This process is crucial for increasing receptive fields and reducing the computational requirements of subsequent convolutions. An important property of a pooling operation is that it minimizes information loss, with respect to the initial activation maps, without a significant impact on computation and memory overhead. To meet these requirements, we propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling. Through experiments across a range of architectures and pooling methods, we demonstrate that SoftPool retains more information in the reduced activation maps. This refined downsampling leads to improvements in a CNN's classification accuracy. Experiments with pooling-layer substitutions on ImageNet1K show an increase in accuracy over both the original architectures and other pooling methods. We also test SoftPool on video datasets for action recognition; again, through the direct replacement of pooling layers, we observe consistent performance improvements while computational load and memory requirements remain limited.
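
    At its core, exponentially weighted downsampling replaces the mean or maximum over each pooling window with a softmax-weighted sum of the window's activations. A minimal PyTorch sketch of that computation (not the authors' optimized implementation) can be built from two average-pooling calls:

```python
import torch
import torch.nn.functional as F

def soft_pool2d(x, kernel_size=2, stride=2):
    # Each activation is weighted by its softmax weight within the window,
    # so larger activations contribute exponentially more than in average
    # pooling, while all inputs still receive gradients (unlike max pooling).
    e_x = torch.exp(x)
    # Computes sum(x * e_x) / sum(e_x) per window; the two avg-pool 1/n
    # factors cancel. A production version would guard exp() against overflow.
    return F.avg_pool2d(x * e_x, kernel_size, stride) / F.avg_pool2d(e_x, kernel_size, stride)
```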

    Complex Paralinguistic Analysis of Speech: Predicting Gender, Emotions and Deception in a Hierarchical Framework

    In this paper, we present a hierarchical framework for complex paralinguistic analysis of speech, covering gender, emotion and deception recognition. The framework builds on research into the interrelation of various paralinguistic phenomena: it uses gender information to predict emotional states, and the outcome of emotion recognition to predict the truthfulness of the speech. We use multiple datasets (aGender, Ruslana, EmoDB and DSD) to perform within-corpus and cross-corpus experiments with various performance measures. The experimental results reveal that gender-specific models improve the effectiveness of automatic speech emotion recognition by up to an absolute 5.7% in terms of Unweighted Average Recall, and that integrating emotion predictions improves the F-score of automatic deception detection over our baseline by an absolute 4.7%. The obtained cross-validation result of 88.4 ± 1.5% for deception detection beats the existing state of the art by an absolute 2.8%.
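
    One way to realize such a cascade is to feed each stage's posterior probabilities forward as extra features for the next stage. Note that the paper trains gender-specific emotion models, so the sketch below is an illustrative variant, with all names hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cascade(X, gender, emotion, deceptive):
    """Sketch: gender -> emotion -> deception, chained via posteriors."""
    g_clf = LogisticRegression(max_iter=1000).fit(X, gender)
    # Stage 2: the gender posterior augments the acoustic features,
    # yielding a gender-informed emotion classifier.
    Xg = np.column_stack([X, g_clf.predict_proba(X)[:, 1]])
    e_clf = LogisticRegression(max_iter=1000).fit(Xg, emotion)
    # Stage 3: emotion posteriors condition the deception classifier.
    Xe = np.column_stack([Xg, e_clf.predict_proba(Xg)])
    d_clf = LogisticRegression(max_iter=1000).fit(Xe, deceptive)
    return g_clf, e_clf, d_clf
```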

    Fully-attentive and interpretable: vision and video vision transformers for pain detection

    Pain is a serious and costly issue globally, but to be treated, it must first be detected. Vision transformers are a top-performing architecture in computer vision, yet there is little research on their use for pain detection. In this paper, we propose the first fully-attentive automated pain detection pipeline, achieving state-of-the-art performance on binary pain detection from facial expressions. The model is trained on the UNBC-McMaster dataset, after faces are 3D-registered and rotated to the canonical frontal view. In our experiments, we identify important areas of the hyperparameter space and their interaction with vision and video vision transformers, obtaining three noteworthy models. We analyse the attention maps of one of our models and find reasonable interpretations for its predictions. We also evaluate Mixup, an augmentation technique, and Sharpness-Aware Minimization, an optimizer, but neither improves performance. Our models, ViT-1 (F1 score 0.55 ± 0.15), ViViT-1 (F1 score 0.55 ± 0.13) and ViViT-2 (F1 score 0.49 ± 0.04), all outperform earlier works, showing the potential of vision transformers for pain detection.
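
    As a minimal sketch of the image branch of such a pipeline (using the timm library; the paper's exact architecture, preprocessing, and video models are not reproduced here, and the backbone name is an illustrative assumption):

```python
import timm
import torch

# Hypothetical binary pain classifier over 3D-registered, frontalized
# face crops; "vit_base_patch16_224" is an illustrative backbone choice.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
faces = torch.randn(8, 3, 224, 224)   # stand-in batch of face crops
logits = model(faces)                 # shape (8, 2): no-pain vs. pain
```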